This project involved the analysis of breast cancer data derived from the TCGA-BRCA project using R. Multi-omics data (RNAseq STAR counts data, microRNA data proteomics and clinical data) of (495 primary solid) tumor samples was downloaded and processed using TCGABiolinks R package. The RNAseq data were divided into two matrices, one for the long noncoding RNAs and one for the mRNA.
To prepare the dataests for MOFA algorithm, we started with features quality control, normalization, and scaling. MOFA2 (Multi- Omics Factor Analysis) R package was used for multi-omics data integration, which uses an unsupervised approach to discover the patterns that drive variation across different molecular layers. Features included in MOFA analysis were (2281 for mRNA, 1572 for miRNA, 378 for Proteome and lncRNA for 2228).
To identify differentially enriched pathways, biological processes and functions, gene set enrichment analysis was performed for mRNA and Proteome using REACTOME and C5(GO: BP) from MSIGDB. On the other hand, overrepresentation analysis (ORA) was conducted on the miRNA and lncRNA. First, differential expression analysis was performed using DESeq2 Package, and the samples were grouped based on their position in factor one of MOFA, positive and negative groups. Then, ORA was performed using TAM-2 tools for miRNA and PANTHER web tool for lncRNA. The ORA analysis used REACTOME, GO: BP, and GO: MF.
#Set Working Directory
#Libraries
library(readxl)
library(readr)
library(vsn)
library(DESeq2)
library(tidyverse)
library(ComplexHeatmap)
library(ggfortify)
library(umap)
library(ggplot2)
library(ggrepel)
library(TCGAbiolinks)
library(ComplexUpset)
library(ggupset)
library(openxlsx)
##Loading MetaData & Samples
In this section we loaded the Samples from TCGA database Link: (https://portal.gdc.cancer.gov/) using the TCGAbiolinks package Link: (https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html)
#MetaData
RNA_TCGA <- read.delim("Sample_Sheets/RNA_SampleSheet.tsv" )
miRNA_TCGA <- read.delim("Sample_Sheets/miRNA_SampleSheet.tsv" )
Methylation_TCGA <- read.delim("Sample_Sheets/Methylation_SampleSheet.tsv" )
Protein_TCGA <- read.delim("Sample_Sheets/Protein_SampleSheet.tsv" )
We Found 1207 , 919 and 1231 TCGA-BRCA Samples in miRNA , Protein and RNA databases respectively. To figure out the common samples w did the following steps..
Intersect_RNA_miRNA <- intersect(RNA_TCGA$Case.ID , miRNA_TCGA$Case.ID )
Intersect_Meth_Prot <- intersect(Methylation_TCGA$Case.ID , Protein_TCGA$Case.ID)
barcodes <- intersect(Intersect_RNA_miRNA , Intersect_Meth_Prot)
629 Samples have all the four omics. By Sample ID we will subset the Omics metadata
df <- data.frame(Case.ID = barcodes)
dim(df)
## [1] 629 1
#Subsetting the Common Samples from the BulkRNA Samples
RNA_Samples <- RNA_TCGA[which(RNA_TCGA$Case.ID %in% df$Case.ID),]
table(RNA_Samples$Sample.Type)
##
## Metastatic Primary Tumor Solid Tissue Normal
## 5 640 86
barplot(table(RNA_Samples$Sample.Type) , space = 0 , horiz = F, main = "RNA Sample Types")
640 Primary Tumor Samples are our target, and this indicates that there
are some patients have more than one sample and others have metastatic
and adjasent normal extracted samples.
#Subsetting the Common Samples from the miRNA Samples
miRNA_Samples <- miRNA_TCGA[which(miRNA_TCGA$Case.ID %in% df$Case.ID),]
table(miRNA_Samples$Sample.Type)
##
## Metastatic Primary Tumor Solid Tissue Normal
## 5 639 78
barplot(table(miRNA_Samples$Sample.Type) , space = 0 , horiz = F , main = "mi-RNA Sample Types")
639 Primary Tumor Samples are our target, and this indicates that there
are some patients have more than one sample and others have metastatic
and adjasent normal extracted samples.
#Subsetting the Common Samples from the Protein Samples
Protein_Samples <- Protein_TCGA[which(Protein_TCGA$Case.ID %in% df$Case.ID),]
table(Protein_Samples$Sample.Type)
##
## Metastatic Primary Tumor Solid Tissue Normal
## 4 629 28
barplot(table(Protein_Samples$Sample.Type) , space = 0 , horiz = F , main = "Protein Sample Types")
629 Primary Tumor Samples are our target, and this indicates that there
are some patients have more than one sample and others have metastatic
and adjasent normal extracted samples.
#Load Metadata
load('metaData')
#Upset Plot
#Upset plot
upset_list <- list()
for(sample in unique(Clinical_data_ann$bcr_patient_barcode)){
exist <- c()
if (sample %in% unique(RNA_TCGA$Case.ID)){
exist <- c(exist, 'BulkRNA')
}
if(sample %in% unique(Protein_TCGA$Case.ID)){
exist <- c(exist, 'Protein')
}
if(sample %in% unique(Methylation_TCGA$Case.ID)){
exist <- c(exist, 'Methylation')
}
if(sample %in% unique(miRNA_TCGA$Case.ID)){
exist <- c(exist, 'miRNA')
}else{
exist <- c(exist, NA)
}
upset_list[[sample]] <- exist
}
upset_tibble <- data.frame(sample = unique(Clinical_data_ann$bcr_patient_barcode))
upset_tibble <- as_tibble(upset_tibble)
upset_tibble$list <- upset_list
upset_tibble %>% ggplot(aes(x=list)) +
geom_bar() +
xlab('Omics Type')+
scale_x_upset(n_intersections = 10)+
scale_x_mergelist(sep = "-") +
axis_combmatrix(sep = "-") +
scale_x_upset(order_by = "freq")
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
Only 629 Samples has four Omics, However the samples which will be
included in the analysis will be further sub-setted because the selected
metdata lacks some info about the breast cancer subtypes of some
samples.
rm(upset_list , upset_tibble , df , Methylation_TCGA , miRNA_TCGA , RNA_TCGA , Protein_TCGA)
rm(exist , Intersect_Meth_Prot , Intersect_RNA_miRNA , sample)
Save those Common Samples as txt format
#Save the Samples as txt file
write.table(barcodes, 'Common_Samples.txt' , quote = F , row.names = F , col.names = F)
metadata <- Clinical_data_ann
rename rows of metadata by sample IDs for further sub-setting
row.names(metadata) <- metadata$bcr_patient_barcode
We sub-setted the metadata to only 628 samples as one of the 629 is not avilable in this metadata
indexes=match(barcodes,rownames(metadata))
indexes= indexes[!is.na(indexes)]
metadata <- metadata[indexes , ]
Explore Breast Cancer sub-types
barplot(table(metadata$IMS), main = "Breast Cancer Subtypes in selected Samples")
This Barplot shows that selected samples include: Four breast cancer
sub-types (Basal: 129 samples , Her2: 91 samples , LumA: 197 samples ,
LumB: 130 samples)
Check if all samples are included in each Omic metadata
sum(metadata$bcr_patient_barcode %in% RNA_Samples$Case.ID)
## [1] 628
sum(metadata$bcr_patient_barcode %in% miRNA_Samples$Case.ID)
## [1] 628
sum(metadata$bcr_patient_barcode %in% Protein_Samples$Case.ID)
## [1] 628
#BulkRNA
We used TCGAbiolinks with following arguments to download the selected 628 Bulk-RNA data from GDC database
#Create the selected samples query
BulkRNA_query <- GDCquery(project = 'TCGA-BRCA',
data.category = 'Transcriptome Profiling',
data.type = 'Gene Expression Quantification',
sample.type = 'Primary Tumor',
barcode = barcodes)
#Download the samples files
GDCdownload(BulkRNA_query)
Read the downloaded files to form the Gene expression matrix
##### load the mRNA-Seq data #####
data.path="GDCdata/TCGA-BRCA/harmonized/Transcriptome_Profiling/Gene_Expression_Quantification"
files <- list.files(path=data.path,recursive=T, pattern = "tsv")
file=files[1]
filepath=file.path(data.path,files[1])
colnames= unlist( strsplit ( readLines(filepath ,n=2) [2] ,"\t"))
temp <- read.table(filepath, header=F,skip = 6)
names(temp)=colnames
head(temp)
exp=temp[,c(1,2,3)]
for(i in 1: length(files))
{
file=files[i]
file.id=strsplit(file,"/")[[1]][1]
filepath=file.path(data.path,files[i])
temp <- read.table(filepath, header=F,skip = 6)
colnames(temp)[4]=file.id
exp=cbind(exp,temp[4])
}
head(exp)
Select only protein coding transcripts
exp_pc <- exp%>%filter(gene_type == "protein_coding")
Check for duplicated transcripts that corresponds to the same gene
# check duplciation of of gene symbols?
x=duplicated(exp_pc$gene_name)
sum(x)
exp.data=exp_pc[,4:dim(exp_pc)[2]]
exp.data=apply(exp.data,2, as.numeric)
head(exp.data)
Remove Duplication by aggregation of the duplicated transcripts by taking there mean value expression into one gene
#### remove duplication by aggregation
exp.data.agg= aggregate(exp.data, list(exp_pc$gene_name),FUN=mean)
rownames(exp.data.agg)=exp.data.agg$Group.1
exp.data.agg=exp.data.agg[-1]
exp.data.agg=round(exp.data.agg)
file.ids=colnames(exp.data.agg)
View(exp.data.agg)
We renamed the samples with there TCGA sample ID instead of the file IDs
###### load the mrna sample sheets # sample sheets
pheno <- RNA_Samples
View(pheno)
colnames(pheno) = sub(" " , "." , colnames(pheno))
file.ids.pheno=pheno$File.ID
index.files=match(file.ids,file.ids.pheno)
pheno <- pheno[index.files, ]
table(pheno$`Sample.Type`)
file.ids.pheno=pheno$`File.ID`
index.files=match(file.ids,file.ids.pheno)
names(exp.data.agg)=pheno$Case.ID[index.files]
BulkRNA.data <- exp.data.agg[,metadata$bcr_patient_barcode]
We saved the read-count expression matrix as RDATA file
save(BulkRNA.data , file = "BulkRNA.data.RDATA")
Load
load("BulkRNA.data.RDATA")
we filtered Genes with zero variance across all samples to reduce dimensions of the Genes that have no variance explaining the difference between breast cancer sub types
genesVariance <- as.data.frame( apply( BulkRNA.data,
MARGIN = 1 ,
var ,
simplify = T))
names(genesVariance) = 'Variance'
vFilterGenes <- genesVariance %>% filter(Variance > 0 )
vFilterGenes <- row.names(vFilterGenes)
BulkRNA.data <- BulkRNA.data[vFilterGenes, ]
remove low quality genes (i.e. genes that have total expression lower than 10 in all 628 samples)
keep <- rowSums(BulkRNA.data) >= 10
BulkRNA.data <- BulkRNA.data[keep,]
We estimated size factors and variance stabilization transformation
dds = DESeqDataSetFromMatrix( countData = BulkRNA.data,
colData = metadata ,
design = ~gender)
dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)
View(ntd)
hist(genesVariance$Variance)
Filter genes with highest variance according to specific variance cut-off extracted from the shown histogram
Variance_Cutoff <- 15e+6
vFilterGenes <- genesVariance %>% arrange(desc(Variance)) %>% filter(Variance >= Variance_Cutoff)
vFilterGenes <- row.names(vFilterGenes)
ntd <- ntd[vFilterGenes,]
Log2 Normalization was done to normalize the data
BulkRNA.data.log= log2(ntd + 1)
we Scaled the matrix by Genes to be ready for linear combination model by MOFA
BulkRNA.data.log.scale <- scale(t(BulkRNA.data.log) , center = TRUE, scale = TRUE)
BulkRNA.data.log.scale <- as.data.frame(t(BulkRNA.data.log.scale))
We saved the normalized and scaled gene expression matrix to RDATA file
save(BulkRNA.data.log , BulkRNA.data.log.scale , metadata , file = "BulkRNA.data_.RDATA")
Load
load("BulkRNA.data_.RDATA")
Draw Plots of the data before and after normalization and scaling
options(repr.plot.width=20,repr.plot.height=8)
par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(BulkRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(BulkRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(BulkRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')
We saved the read-count gene expression matrix to be used for selecting
long non-coding RNA
save(exp , RNA_Samples , file = "ln_material.RDATA")
load("ln_material.RDATA")
#miRNA We used TCGAbiolinks with following arguments to download the selected 628 mi-RNA data from GDC database
miRNA_query <- TCGAbiolinks::GDCquery(project = 'TCGA-BRCA',
data.category = 'Transcriptome Profiling',
data.type = 'miRNA Expression Quantification',
sample.type = 'Primary Tumor',
barcode = barcodes)
GDCdownload(miRNA_query)
Read the downloaded files to form the miRNA expression matrix
data.path="GDCdata/TCGA-BRCA/harmonized/Transcriptome_Profiling/miRNA_Expression_Quantification"
files <- list.files(path=data.path,recursive=T, pattern = "txt")
file=files[1]
filepath=file.path(data.path,files[1])
colnames= unlist( strsplit ( readLines(filepath ,n=2) [1] ,"\t"))
temp <- read.table(filepath, header=T)
exp_mi=temp[,c(1)]
for(i in 1: length(files))
{
file=files[i]
file.id=strsplit(file,"/")[[1]][1]
filepath=file.path(data.path,files[i])
temp <- read.table(filepath, header=T)
colnames(temp)[2]=file.id
exp_mi=cbind(exp_mi,temp[2])
}
row.names(exp_mi) = exp_mi$exp_mi
exp_mi = exp_mi[-1]
file.ids=colnames(exp_mi)
#miRNA.data = exp_mi
file.ids.pheno=miRNA_Samples$File.ID
index.files=match(file.ids,file.ids.pheno)
names(exp_mi)=miRNA_Samples$Case.ID[index.files]
miRNA.data=exp_mi
indexes=match(metadata$bcr_patient_barcode,colnames(miRNA.data))
indexes= indexes[!is.na(indexes)]
miRNA.data <- miRNA.data[,indexes]
We saved the read count miRNA expression matrix as RDATA file
save(miRNA.data , file = "miRNA.data.RDATA")
Load
load("miRNA.data.RDATA")
We removed miRNA features with zero variance
miRNAVariance <- as.data.frame( apply( miRNA.data,
MARGIN = 1 ,
var ,
simplify = T))
names(miRNAVariance) = 'Variance'
vFiltermiRNA <- miRNAVariance %>% filter(Variance > 0 )
vFiltermiRNA <- row.names(vFiltermiRNA)
miRNA.data <- miRNA.data[vFiltermiRNA, ]
We calculated the estimated size factors and variane stabilizing transformation for removing of the biological and techniqal sequencing errors
dds = DESeqDataSetFromMatrix( countData = miRNA.data,
colData = metadata[colnames(miRNA.data),],
design = ~gender)
dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)
we log normalized the miRNA expression matrix
miRNA.data.log= log2(ntd + 1)
We scaled the miRNA normalized expression matrix by feature to be introduced to MOFA
miRNA.data.log.scale <- scale(t(miRNA.data.log) , center = TRUE, scale = TRUE)
miRNA.data.log.scale <- as.data.frame(t(miRNA.data.log.scale))
We saved the normalized and scaled matrices as RDATA file
save(miRNA.data.log , miRNA.data.log.scale , metadata , file = "miRNA.data_.RDATA")
Load
load("miRNA.data_.RDATA")
options(repr.plot.width=20,repr.plot.height=8)
par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(miRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(miRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(miRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')
#lnRNA We filtered the read count gene expression matrix for only lnc RNA
exp_lnc <- exp%>%filter(gene_type == "lncRNA")
We checked for duplicated transcripts
# check duplciation
x=duplicated(exp_lnc$gene_name)
sum(x)
We aggregated the duplicated transcripts by taking there mean values
exp.data=exp_lnc[,4:dim(exp_lnc)[2]]
exp.data=apply(exp.data,2, as.numeric)
View(exp.data)
#### remove duplication by aggregation
exp.data.agg= aggregate(exp.data, list(exp_lnc$gene_name),FUN=mean)
rownames(exp.data.agg)=exp.data.agg$Group.1
exp.data.agg=exp.data.agg[-1]
exp.data.agg=round(exp.data.agg)
file.ids=colnames(exp.data.agg)
View(exp.data.agg)
We renamed the lncRNA expression matrix by sample IDs instead of File IDs
###### load the mrna sample sheets # sample sheets
pheno <- RNA_Samples
View(pheno)
colnames(pheno) = sub(" " , "." , colnames(pheno))
file.ids.pheno=pheno$File.ID
index.files=match(file.ids,file.ids.pheno)
pheno <- pheno[index.files, ]
table(pheno$`Sample.Type`)
file.ids.pheno=pheno$`File.ID`
index.files=match(file.ids,file.ids.pheno)
names(exp.data.agg)=pheno$Case.ID[index.files]
lncRNA.data <- exp.data.agg[,metadata$bcr_patient_barcode]
Load
load("lncRNA.data_.RDATA")
We removed the lncRNA features with zero variance
lncRNAVariance <- as.data.frame( apply( lncRNA.data,
MARGIN = 1 ,
var ,
simplify = T))
names(lncRNAVariance) = 'Variance'
vFilterlncRNA <- lncRNAVariance %>% filter(Variance > 0 )
vFilterlncRNA <- row.names(vFilterlncRNA)
lncRNA.data <- lncRNA.data[vFilterlncRNA, ]
We estimated the size factors and calculated the variance stabilizing transformation
dds = DESeqDataSetFromMatrix( countData = lncRNA.data,
colData = metadata ,
design = ~race)
dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)
We selected the lncRNA features with highest variance according to specific cut-off to be with suitable dimension while running MOFA algorism
Variance_Cutoff <- 2000
vFilterlncRNA <- lncRNAVariance %>% arrange(desc(Variance)) %>% filter(Variance >= Variance_Cutoff)
vFilterlncRNA <- row.names(vFilterlncRNA)
ntd <- ntd[vFilterlncRNA,]
We normalized the lnRNA expression matrix after filtration
lncRNA.data.log= log2(ntd + 1)
We scaled the matrix by features
lncRNA.data.log.scale <- scale(t(lncRNA.data.log) , center = TRUE, scale = TRUE)
lncRNA.data.log.scale <- as.data.frame(t(lncRNA.data.log.scale))
We saved the lnRNA normalized and scaled data to RDATA file
save(lncRNA.data , lncRNA.data.log , lncRNA.data.log.scale , file = "lncRNA.data_.RDATA")
options(repr.plot.width=20,repr.plot.height=8)
par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(lncRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(lncRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(lncRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')
#Proteins We created GDCquery for protein samples by using TCGAbiolinks R package
query <- GDCquery(project = "TCGA-BRCA",
data.category = "Proteome Profiling",
access = "open",
legacy = FALSE,
barcode = barcodes,
platform = "RPPA",
data.type = "Protein Expression Quantification" ,
sample.type = "Primary Tumor")
View(getResults(query))
we run GDC download and prepare to form the protein expression matrix
GDCdownload(query)
We saved the normalized and scaled protein matrix as RDATA file
Protein.data.log.scale <- GDCprepare (query, save.filename = "RPPA_TCGA-BRCA", save = TRUE)
save(Protein.data.log.scale, file = "Protein.data.log.scale_.RDATA")
As each feature in the protein matrix is protein ID we annotated each feature into Gene ID for further enrichment analysis by using AGID annotation data base to select the corresponding gene name
# read the loaded annotation for AGID
AGID <- read_delim("AGID.tsv", delim = "\t", escape_double = FALSE,
trim_ws = TRUE)
AGID_ <- AGID %>% select(AGID , gene_name)
#1. set AGID name to one gene
AGID_$gene_name<- sub ("/.*", "", AGID_$gene_name)
#2. set rowname for RPPA
Protein.data.log.scale<- merge(Protein.data.log.scale , AGID_ , by= "AGID")
We removed the duplicated gene names by taking their mean values
Protein.data.log.scale=Protein.data.log.scale[,6:dim(Protein.data.log.scale)[2]]
Protein.data.log.scale_num=apply(Protein.data.log.scale,2, as.numeric)
View(Protein.data.log.scale_num)
#### remove duplication by aggregation
Protein.data.log.scale_agg= aggregate(Protein.data.log.scale_num, list(Protein.data.log.scale$gene_name),FUN=mean)
Protein.data.log.scale_agg <- Protein.data.log.scale_agg[-dim(Protein.data.log.scale_agg)[2]]
rownames(Protein.data.log.scale_agg)=Protein.data.log.scale_agg$Group.1
Protein.data.log.scale_agg=Protein.data.log.scale_agg[-1]
View(Protein.data.log.scale_agg)
We Ckecked the protein feature variances
num.mat <- as.matrix( Protein.data.log.scale_agg)
vars<- rowVars(num.mat, na.rm = TRUE)
We Detectd variances equal to zero or NA, and remove them from the main matrix
zero_var<- which( vars== 0 | is.na(vars))
length (zero_var)
sum (vars == 0)
Now we have 22 feature with zero variance
num.mat<- num.mat[ - zero_var, ]
Missings imputation We Computed all missinigness rate in proteins
#all_elmnts
nrow(num.mat) * ncol(num.mat)
# NAs_elmnts
sum(is.na(num.mat))
# # all missingness percentage
sum(is.na(num.mat))/(nrow(num.mat) * ncol(num.mat))
round(sum(is.na(num.mat))/(nrow(num.mat) * ncol(num.mat)) *100,1)
so data with 7.6 missing data
we Computed missinigness rate in each feature
missd_features<- rowSums( is.na(num.mat))/ nrow(num.mat)*100
sum (missd_features> 50)
quantile(missd_features)
We visulalized missings rates
## draw histogram for missingness rate for each feature
h=hist( missd_features,
breaks=10,
main="",
xlab="percentage of missingness")
# add counts to each break
text(h$mids,h$counts,labels=h$counts, adj=c(0.5, -0.5))
# the cutoff will be 50%
abline(v = 50)
good.inx=apply(is.na(num.mat), 1, sum)/ncol(num.mat) <0.2
sum(good.inx)
num.mat.good<- num.mat[good.inx, ]
library(impute)
Here we have no samples with more than 50% missing rate we imputed missingness by K nearest neighbour algorism by using impute.knn function fro impute R package
num.mat.imputed<- impute.knn(num.mat.good)$data
Confirm that we dot have missed data
sum (rowSums(is.na(num.mat.imputed)))
Protein.data.log.scale <- num.mat.imputed
Protein.data.log.scale <- scale(t(Protein.data.log.scale) , center = TRUE, scale = TRUE)
Protein.data.log.scale <- as.data.frame(t(Protein.data.log.scale))
We renamed the samples with sample IDs
col_names <- c()
for(name in colnames(Protein.data.log.scale)){
x <- str_split(name , pattern = "-")
X <- paste0(x[[1]][1] ,"-" ,x[[1]][2],"-" , x[[1]][3])
col_names = c(col_names , X)
}
colnames(Protein.data.log.scale) <- col_names
We saved the normalized and scaled protein matrix as RDATA file
save(Protein.data.log.scale , file = "Protein_data_.RDATA")
Load
load("Protein_data_.RDATA")
#MOFA Further sub-setting for the matrices will be through samples which are corresponding to breast cancer sub-types from the metadata (not adjacent normal samples)
FilteredMetaData <- as.data.frame(metadata %>% filter (IMS != 'Normal' & !is.na(IMS)))
FilteredMetaData <- FilteredMetaData[colnames(miRNA.data.log.scale),]
FilteredMetaData <- FilteredMetaData[complete.cases(FilteredMetaData),]
table(FilteredMetaData$IMS)
We Sub-setted the four matrices and prepared them for MOFA
miRNA <- miRNA.data.log.scale[,row.names(FilteredMetaData)]
miRNA_row.names <- row.names(miRNA)
miRNA <- apply(miRNA,2, as.numeric)
row.names(miRNA) = miRNA_row.names
RNA <- BulkRNA.data.log.scale[,row.names(FilteredMetaData)]
RNA_row.names <- row.names(RNA)
RNA <- apply(RNA,2, as.numeric)
row.names(RNA) = RNA_row.names
lncRNA <- lncRNA.data.log.scale[,row.names(FilteredMetaData)]
lncRNA_row.names <- row.names(lncRNA)
lncRNA <- apply(lncRNA,2, as.numeric)
row.names(lncRNA) = lncRNA_row.names
Protein <- Protein.data.log.scale[,row.names(FilteredMetaData)]
Protein_row.names <- row.names(Protein)
Protein <- apply(Protein,2, as.numeric)
row.names(Protein) = Protein_row.names
library(MOFA2)
library(MOFAdata)
library(data.table)
Load the MOFA Object
load('MOFA_BRCA.RDATA')
upload the matrices to a list
Breast_data <- list(Genes = RNA,
miRNA = miRNA,
lncRNA = lncRNA,
Protein = Protein)
Create and visualize MOFA object
MOFAobject <- create_mofa(Breast_data)
plot_data_overview(MOFAobject)
Select MOFA parameters for the model (30 factors are selected after
trying different numbers as the best estimaed variance by each omic)
data_opts <- get_default_data_options(MOFAobject)
model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 30
train_opts <- get_default_training_options(MOFAobject)
train_opts$convergence_mode <- "slow"
train_opts$seed <- 42
train_opts$maxiter <- 1000
MOFAobject <- prepare_mofa(MOFAobject,
data_options = data_opts,
model_options = model_opts,
training_options = train_opts
)
MOFAobject <- run_mofa(MOFAobject)
Add metadata to the MOFA Object
Total_pheno <- metadata[row.names(FilteredMetaData),]
Total_pheno$sample <- Total_pheno$bcr_patient_barcode
samples_metadata(MOFAobject) <- Total_pheno
Save the MOFA object
save(MOFAobject , file = "MOFA_BRCA.RDATA")
Plot Factor correlation
plot_factor_cor(MOFAobject)
Non of the 30 factors are higly correlated to each other
Plot variance ration explaind by each factor accros the four omics
plot_variance_explained(MOFAobject, max_r2=10)
Factors 1 and 3 explain the variance using the four omics with the
highest ration for Genes and lncRNA. while factor 2 is specific for
Genes and lncRNA features. Factors 4 and 5 only explains variance bwteen
samples by using the Protein features. other factors explains lower
ratios.
Variance_Explained <- MOFAobject@cache$variance_explained$r2_per_factor
Variance_Explained <- as.data.frame(Variance_Explained)
Variance_Explained
## group1.Genes group1.miRNA group1.lncRNA group1.Protein
## Factor1 12.016674731 2.56949929 9.888162736 5.47541969
## Factor2 12.708122032 0.41444172 5.629247491 0.69203583
## Factor3 6.806844119 2.13510060 3.611907712 2.83954086
## Factor4 0.001782687 0.00173897 0.001106563 14.62200619
## Factor5 0.045955985 0.18939580 0.046665365 10.62746466
## Factor6 4.416843067 1.00028579 3.001321935 1.53409754
## Factor7 2.154266082 0.77097387 1.755995737 1.86625015
## Factor8 2.520651096 0.57101505 2.016878282 0.99117344
## Factor9 1.966321466 0.55410141 1.464950856 0.68734177
## Factor10 2.749176068 0.18430931 1.207922621 0.16068761
## Factor11 1.595826453 0.47570821 1.523711294 0.68442115
## Factor12 1.059913447 0.65292315 0.849573109 1.51030911
## Factor13 1.389206660 0.36780472 1.893903902 0.37126287
## Factor14 0.032076359 2.48546806 0.031600010 0.39594611
## Factor15 0.616893772 0.84118401 0.457211354 0.82724650
## Factor16 1.062840268 0.27068094 0.923513367 0.37651265
## Factor17 1.840139533 0.12602967 0.476650149 0.17629967
## Factor18 0.362660524 0.17904999 1.817674746 0.22367164
## Factor19 1.118038532 0.14845793 0.841035482 0.42597099
## Factor20 0.562255765 0.96154716 0.569168003 0.27639691
## Factor21 0.710417102 0.24845109 0.849524529 0.49881563
## Factor22 0.719500944 0.37399960 0.596665642 0.38498971
## Factor23 1.003499944 0.21267797 0.651121724 0.18532291
## Factor24 0.842638748 0.14746152 0.759709395 0.26894959
## Factor25 0.755931575 0.21840670 0.614697301 0.39581000
## Factor26 0.859126156 0.22593288 0.730424815 0.11567401
## Factor27 1.098277816 0.18818156 0.327304423 0.14161400
## Factor28 0.608015270 0.32159671 0.624949379 0.18530124
## Factor29 0.635437267 0.28986551 0.679502876 0.08282037
## Factor30 0.579915085 0.18822674 0.535301945 0.37484299
plot_factors(MOFAobject,
factors = 1:3,
color_by = 'IMS'
)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
plot_variance_explained(MOFAobject, plot_total = T)[[2]]
As total variance explained, Genes and priteins explain the highest
variance among the selected samples.
plot_factor(MOFAobject,
factors = c(1,2,3,4),
color_by = "IMS"
)
## Warning: `fct_explicit_na()` was deprecated in forcats 1.0.0.
## ℹ Please use `fct_na_value_to_level()` instead.
## ℹ The deprecated feature was likely used in the MOFA2 package.
## Please report the issue at <https://github.com/bioFAM/MOFA2>.
Factor 1 explains the variance between the Basal breast cancer subtype
and the Luminal ones. while other factors cannot explaint the variance
between the selected subtypes.
plot_factor(MOFAobject,
factors = c(5,6,7,8),
color_by = "IMS"
)
plot_data_heatmap(MOFAobject,
view = "miRNA",
factor = 1,
features = 100,
cluster_rows = TRUE, cluster_cols = TRUE,
show_rownames = FALSE, show_colnames = FALSE,
scale = "none"
)
plot_data_heatmap(MOFAobject,
view = "Genes",
factor = 1,
features = 50,
cluster_rows = FALSE, cluster_cols = TRUE,
show_rownames = TRUE, show_colnames = TRUE,
scale = "none"
)
plot_data_heatmap(MOFAobject,
view = "Protein",
factor = 1,
features = 50,
cluster_rows = FALSE, cluster_cols = TRUE,
show_rownames = TRUE, show_colnames = TRUE,
scale = "none"
)
plot_data_heatmap(MOFAobject,
view = "lncRNA",
factor = 1,
features = 50,
cluster_rows = FALSE, cluster_cols = TRUE,
show_rownames = TRUE, show_colnames = TRUE,
scale = "none"
)
Total_pheno <- as.data.frame(MOFAobject@samples_metadata)
colnames(MOFAobject@samples_metadata)
## [1] "X" "bcr_patient_barcode"
## [3] "type" "age_at_initial_pathologic_diagnosis"
## [5] "gender" "race"
## [7] "ajcc_pathologic_tumor_stage" "clinical_stage"
## [9] "histological_type" "histological_grade"
## [11] "initial_pathologic_dx_year" "menopause_status"
## [13] "birth_days_to" "vital_status"
## [15] "tumor_status" "last_contact_days_to"
## [17] "death_days_to" "cause_of_death"
## [19] "new_tumor_event_type" "new_tumor_event_site"
## [21] "new_tumor_event_site_other" "new_tumor_event_dx_days_to"
## [23] "treatment_outcome_first_course" "margin_status"
## [25] "residual_tumor" "OS"
## [27] "OS.time" "DSS"
## [29] "DSS.time" "DFI"
## [31] "DFI.time" "PFI"
## [33] "PFI.time" "Redaction"
## [35] "IMS" "ethnicity"
## [37] "Ethnicity_reported" "Ethnicity_reported_simple"
## [39] "ICRscore" "HML.ICR.Cluster"
## [41] "ICR_ES" "IMS_Mathews"
## [43] "Assigned_Ethnicity" "Assigned_Ethnicity_simplified"
## [45] "sample" "group"
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:IRanges':
##
## desc
## The following object is masked from 'package:stats':
##
## filter
Statistical Anova test shows that factor one also significantly correlated with the race of the samples which may affect the variance explained by factor 1 due to biological differences (Can be better visualized)
Zmatrix <- as.data.frame(MOFAobject@expectations$Z)
for(i in 1){
df <- data.frame(factor = as.numeric(Zmatrix[,i]) , obj = Total_pheno[,"race"])
Ttest <- df %>% anova_test(factor~obj)
print(Ttest)
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 obj 3 491 12.735 5e-08 * 0.072
Anova Test shown significant correlation between race and samples values in Factor1
Zmatrix <- as.data.frame(MOFAobject@expectations$Z)
for(i in 1){
df <- data.frame(factor = as.numeric(Zmatrix[,i]) , obj = Total_pheno[,"IMS"])
Ttest <- df %>% anova_test(factor~obj)
print(Ttest)
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 obj 3 491 573.403 5.59e-160 * 0.778
Anova Test shown significant correlation between breast cancer subtypes and samples values in Factor1
MOFAobject@samples_metadata$age_diagnosis <-
MOFAobject@samples_metadata$age_at_initial_pathologic_diagnosis
correlate_factors_with_covariates(MOFAobject,
covariates = c("age_diagnosis","birth_days_to","last_contact_days_to"),
#plot="log_pval"
plot = "r"
)
## Warning in correlate_factors_with_covariates(MOFAobject, covariates =
## c("age_diagnosis", : There are non-numeric values in the covariates data.frame,
## converting to numeric...
correlate_factors_with_covariates(MOFAobject,
covariates = c("age_diagnosis","birth_days_to","last_contact_days_to" ),
plot="log_pval"
#plot = "r"
)
## Warning in correlate_factors_with_covariates(MOFAobject, covariates =
## c("age_diagnosis", : There are non-numeric values in the covariates data.frame,
## converting to numeric...
Factor1 samples are slightly correlated with days to birth and age at
diagnosis
p <- plot_factors(MOFAobject,
factors = c(1,3),
color_by = "IMS",
dot_size = 2.5,
show_missing = T
)
p <- p +
#geom_hline(yintercept=0.1, linetype="dashed") +
geom_vline(xintercept=(-0.2), linetype="dashed")+
geom_vline(xintercept=(1.2), linetype="dashed")
print(p)
Factor1 against Factor3 reveals that Factor1 can also explain variance
between Basal and Her2 subtypes for future analysis
#Genes Results
plot_top_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 20,
sign = 'positive',# Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
plot_top_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 20,
sign = "negative",
scale = F # Scale weights from -1 to 1
)
#miRNA Results
plot_top_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 20,
sign = 'positive',# Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
plot_top_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 20,
sign = "negative",
scale = T # Scale weights from -1 to 1
)
#Protein Results
plot_top_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 20,
sign = 'positive',# Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
plot_top_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 20,
sign = "negative",
scale = T # Scale weights from -1 to 1
)
#Enrichment
library(biomaRt)
RNA_Features <- features_names(MOFAobject)[["Genes"]]
lncRNA_Features <- features_names(MOFAobject)[["lncRNA"]]
miRNA_Features <- features_names(MOFAobject)[["miRNA"]]
Protein_Features <- features_names(MOFAobject)[["Protein"]]
utils::data(reactomeGS)
utils::data("MSigDB_v6.0_C2_human")
#Genes - MOFA - Reactome
Gene_ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
attributes<- listAttributes (Gene_ensembl)
Genes_ensembl_IDs <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
filters = "external_gene_name",
values = RNA_Features,
mart = Gene_ensembl)
indexes <- match(colnames(reactomeGS) , Genes_ensembl_IDs$ensembl_gene_id)
dim(reactomeGS)
sum(!is.na(indexes))
indexes <- !is.na(indexes)
reactome <- reactomeGS[,indexes]
indexes <- match(colnames(reactome) , Genes_ensembl_IDs$ensembl_gene_id)
col_names = Genes_ensembl_IDs$external_gene_name[indexes]
colnames(reactome) <- col_names
save(reactome , file = "reactomeDB.RDATA")
load("reactomeDB.RDATA")
#MOFA - Genes
# GSEA on positive weights, with default options
res.positive <- run_enrichment(MOFAobject,
feature.sets = reactome,
view = "Genes",
sign = "positive"
)
# GSEA on negative weights, with default options
res.negative <- run_enrichment(MOFAobject,
feature.sets = reactome,
view = "Genes",
sign = "negative"
)
plot_enrichment(res.positive,
factor = 1,
max.pathways = 15,
alpha = 0.25
)
plot_enrichment(res.negative,
factor = 1,
max.pathways = 15,
alpha = 0.25
)
#Final Excel Sheet
OUT <- createWorkbook()
#Genes - GSEA - Software
#Prpare the gene set
GSEA_input <- as.data.frame(MOFAobject@expectations$W$Genes[,'Factor1'])
colnames(GSEA_input) <- 'Weights'
#Rename the Genes to remove _Genes
indexes<- which(grepl(pattern = "_Genes" , x = row.names(GSEA_input) ))
Genes_toBeReplaced <- row.names(GSEA_input)[indexes]
Genes_toBeReplaced <- gsub(pattern = "_Genes" , "" , Genes_toBeReplaced)
row.names(GSEA_input)[indexes] <- Genes_toBeReplaced
#Add Genes Column
GSEA_input$Genes <- row.names(GSEA_input)
GSEA_input <- GSEA_input[,c('Genes' , 'Weights')]
GSEA_input <- GSEA_input %>% arrange(desc(Weights))
write.table(GSEA_input , sep = "\t",file = 'GSEA_input.rnk' , quote = FALSE , row.names = FALSE)
Reactome
GSEA_positive_Output_Reactome <- read.delim("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/gsea_report_for_na_pos_1679677635779.tsv") %>% filter(NOM.p.val <= 0.05)%>% dplyr::arrange(NOM.p.val) %>% dplyr::select(NAME , NOM.p.val)
ggplot(GSEA_positive_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
addWorksheet(OUT , 'GSEA_positive_Output_Reactome')
writeData(OUT, sheet = "GSEA_positive_Output_Reactome", x = GSEA_positive_Output_Reactome)
Select Pathways
df <- GSEA_positive_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_INTERFERON_GAMMA_SIGNALING" ,
"REACTOME_KERATINIZATION",
"REACTOME_ANTIGEN_PROCESSING_CROSS_PRESENTATION",
"REACTOME_NEUTROPHIL_DEGRANULATION",
"REACTOME_INFLUENZA_INFECTION",
"REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM",
"REACTOME_COLLAGEN_DEGRADATION",
"REACTOME_DEGRADATION_OF_THE_EXTRACELLULAR_MATRIX",
"REACTOME_CELL_CYCLE_MITOTIC",
"REACTOME_SARS_COV_2_MODULATES_HOST_TRANSLATION_MACHINERY",
"REACTOME_SARS_COV_1_MODULATES_HOST_TRANSLATION_MACHINERY",
"REACTOME_DETOXIFICATION_OF_REACTIVE_OXYGEN_SPECIES",
"REACTOME_SARS_COV_2_HOST_INTERACTIONS",
"REACTOME_HOST_INTERACTIONS_OF_HIV_FACTORS")
indexes <- match(selected_pathways , df[,'NAME'])
GSEA_positive_Output_Reactome_Selected <- df[indexes,]
ggplot(GSEA_positive_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
GSEA_negative_Output_Reactome <- read.delim("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/gsea_report_for_na_neg_1679677635779.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(GSEA_negative_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
addWorksheet(OUT , 'GSEA_negative_Output_Reactome')
writeData(OUT, sheet = "GSEA_negative_Output_Reactome", x = GSEA_negative_Output_Reactome)
Select Pathways
df <- GSEA_negative_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_SPHINGOLIPID_METABOLISM" ,
"REACTOME_PI3K_AKT_SIGNALING_IN_CANCER",
"REACTOME_METABOLISM_OF_LIPIDS",
"REACTOME_NEGATIVE_REGULATION_OF_THE_PI3K_AKT_NETWORK")
indexes <- match(selected_pathways , df[,'NAME'])
GSEA_negative_Output_Reactome_Selected <- df[indexes,]
ggplot(GSEA_negative_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
C5goBp
GSEA_positive_Output_C5 <- read.delim("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/gsea_report_for_na_pos_1679677560989.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(GSEA_positive_Output_C5 , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('GO Biological Processes')
addWorksheet(OUT , 'GSEA_positive_Output_C5')
writeData(OUT, sheet = "GSEA_positive_Output_C5", x = GSEA_positive_Output_C5)
Select Biological Processes
df <- GSEA_positive_Output_C5
colm <- "NAME"
selected_BP <- c("GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE" ,
"GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON",
"GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION",
"GOBP_HUMORAL_IMMUNE_RESPONSE",
"GOBP_REGULATION_OF_SUBSTRATE_ADHESION_DEPENDENT_CELL_SPREADING",
"GOBP_CYTOKINE_PRODUCTION",
"GOBP_DEFENSE_RESPONSE_TO_OTHER_ORGANISM",
"GOBP_INNATE_IMMUNE_RESPONSE",
"GOBP_RESPONSE_TO_MOLECULE_OF_BACTERIAL_ORIGIN",
"GOBP_INFLAMMATORY_RESPONSE",
"GOBP_POSITIVE_REGULATION_OF_IMMUNE_SYSTEM_PROCESS",
"GOBP_CELL_ADHESION",
"GOBP_ANTIMICROBIAL_HUMORAL_IMMUNE_RESPONSE_MEDIATED_BY_ANTIMICROBIAL_PEPTIDE",
"GOBP_COLLAGEN_METABOLIC_PROCESS",
"GOBP_TUMOR_NECROSIS_FACTOR_SUPERFAMILY_CYTOKINE_PRODUCTION",
"GOBP_POSITIVE_REGULATION_OF_CYTOKINE_PRODUCTION",
"GOBP_DEFENSE_RESPONSE_TO_GRAM_POSITIVE_BACTERIUM",
"GOBP_EXTRACELLULAR_MATRIX_DISASSEMBLY",
"GOBP_DNA_REPAIR")
indexes <- match(selected_BP , df[,colm])
GSEA_positive_Output_C5_Selected <- df[indexes,]
ggplot(GSEA_positive_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
GSEA_negative_Output_C5 <- read.delim("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/gsea_report_for_na_neg_1679677560989.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(GSEA_negative_Output_C5 , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 4)) +
xlab('GO Biological Processes')
addWorksheet(OUT , 'GSEA_negative_Output_C5')
writeData(OUT, sheet = "GSEA_negative_Output_C5", x = GSEA_negative_Output_C5)
Select Biological Processes
df <- GSEA_negative_Output_C5
colm <- "NAME"
selected_BP <- c("GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS" ,
"GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT",
"GOBP_SPHINGOLIPID_BIOSYNTHETIC_PROCESS",
"GOBP_LIPID_MODIFICATION",
"GOBP_MEMBRANE_LIPID_METABOLIC_PROCESS",
"GOBP_STEROID_BIOSYNTHETIC_PROCESS",
"GOBP_SPHINGOLIPID_METABOLIC_PROCESS",
"GOBP_LIPID_METABOLIC_PROCESS",
"GOBP_LIPID_OXIDATION",
"GOBP_STEROID_METABOLIC_PROCESS",
"GOBP_LIPID_BIOSYNTHETIC_PROCESS",
"GOBP_REGULATION_OF_NEUROTRANSMITTER_LEVELS",
"GOBP_CELLULAR_LIPID_METABOLIC_PROCESS",
"GOBP_GLANDULAR_EPITHELIAL_CELL_DIFFERENTIATION",
"GOBP_FATTY_ACID_METABOLIC_PROCESS",
"GOBP_NEGATIVE_REGULATION_OF_TRANSMEMBRANE_TRANSPORT",
"GOBP_HORMONE_TRANSPORT",
"GOBP_REGULATION_OF_LIPID_METABOLIC_PROCESS")
indexes <- match(selected_BP , df[,colm])
GSEA_negative_Output_C5_Selected <- df[indexes,]
ggplot(GSEA_negative_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 7)) +
xlab('Reactome Pathways')
#Proteins - GSEA - Software
#Prpare the gene set
Protein_input <- as.data.frame(MOFAobject@expectations$W$Protein[,'Factor1'])
colnames(Protein_input) <- 'Weights'
#Rename the Genes to remove _Genes
indexes<- which(grepl(pattern = "_Protein" , x = row.names(Protein_input) ))
Protein_toBeReplaced <- row.names(Protein_input)[indexes]
Protein_toBeReplaced <- gsub(pattern = "_Protein" , "" , Protein_toBeReplaced)
row.names(Protein_input)[indexes] <- Protein_toBeReplaced
#Add Genes Column
Protein_input$Proteins <-row.names(Protein_input)
Protein_input <- Protein_input[,c('Proteins' , 'Weights')]
Protein_input <- Protein_input %>% arrange(desc(Weights))
write.table(Protein_input , sep = "\t",file = 'Protein_input.rnk' , quote = FALSE , row.names = FALSE)
Reactome
Protein_positive_Output_Reactome <- read.delim("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/gsea_report_for_na_pos_1679677623172.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(Protein_positive_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
addWorksheet(OUT , 'Protein_pos_Reactome')
writeData(OUT, sheet = "Protein_pos_Reactome", x = Protein_positive_Output_Reactome)
Select Pathways
df <- Protein_positive_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_CELL_CYCLE_MITOTIC",
"REACTOME_CELL_CYCLE",
"REACTOME_CELL_CYCLE_CHECKPOINTS",
"REACTOME_MITOTIC_G1_PHASE_AND_G1_S_TRANSITION")
indexes <- match(selected_pathways , df[,'NAME'])
Protein_positive_Output_Reactome_Selected <- df[indexes,]
ggplot(Protein_positive_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5)) +
xlab('Reactome Pathways')
Protein_negative_Output_Reactome <- read.delim("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/gsea_report_for_na_neg_1679677623172.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(Protein_negative_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8)) +
xlab('Reactome Pathways')
addWorksheet(OUT , 'Protein_neg_Reactome')
writeData(OUT, sheet = "Protein_neg_Reactome", x = Protein_negative_Output_Reactome)
Select Pathways
df <- Protein_negative_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_SIGNALING_BY_INSULIN_RECEPTOR",
"REACTOME_INSULIN_RECEPTOR_SIGNALLING_CASCADE")
indexes <- match(selected_pathways , df[,'NAME'])
Protein_negative_Output_Reactome_Selected <- df[indexes,]
ggplot(Protein_negative_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 7)) +
xlab('Reactome Pathways')
C5goBp
Protein_positive_Output_C5 <- read.delim("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/gsea_report_for_na_pos_1679677577693.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(Protein_positive_Output_C5 , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8)) +
xlab('GO Biological Processes')
addWorksheet(OUT , 'Protein_pos_C5')
writeData(OUT, sheet = "Protein_pos_C5", x = Protein_positive_Output_C5)
Select Biological Processes
df <- Protein_positive_Output_C5
colm <- "NAME"
selected_BP <- c('GOBP_MITOTIC_CELL_CYCLE_PROCESS',
'GOBP_RESPONSE_TO_BACTERIUM',
'GOBP_RESPONSE_TO_CYTOKINE',
'GOBP_PHAGOCYTOSIS',
'GOBP_MITOTIC_CELL_CYCLE_CHECKPOINT_SIGNALING',
'GOBP_RESPONSE_TO_MOLECULE_OF_BACTERIAL_ORIGIN',
'GOBP_LYMPHOCYTE_APOPTOTIC_PROCESS',
'GOBP_REGULATION_OF_CELL_CELL_ADHESION',
'GOBP_B_CELL_ACTIVATION',
'GOBP_CYTOKINE_MEDIATED_SIGNALING_PATHWAY',
'GOBP_T_CELL_PROLIFERATION',
'GOBP_NEGATIVE_REGULATION_OF_PROTEIN_SERINE_THREONINE_KINASE_ACTIVITY',
'GOBP_REGULATION_OF_CELL_CYCLE_G2_M_PHASE_TRANSITION',
'GOBP_CELL_CELL_ADHESION',
'GOBP_POSITIVE_REGULATION_OF_I_KAPPAB_KINASE_NF_KAPPAB_SIGNALING')
indexes <- match(selected_BP , df[,colm])
Protein_positive_Output_C5_Selected <- df[indexes,]
ggplot(Protein_positive_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 7)) +
xlab('Reactome Pathways')
Protein_negative_Output_C5 <- read.delim("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/gsea_report_for_na_neg_1679677577693.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)
ggplot(Protein_negative_Output_C5 , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 6)) +
xlab('GO Biological Processes')
addWorksheet(OUT , 'Protein_neg_C5')
writeData(OUT, sheet = "Protein_neg_C5", x = Protein_negative_Output_C5)
Select Biological Processes
df <- Protein_negative_Output_C5
colm <- "NAME"
selected_BP <- c('GOBP_INSULIN_LIKE_GROWTH_FACTOR_RECEPTOR_SIGNALING_PATHWAY',
'GOBP_MAMMARY_GLAND_DEVELOPMENT',
'GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT',
'GOBP_RESPONSE_TO_STEROID_HORMONE',
'GOBP_LIPID_BIOSYNTHETIC_PROCESS',
'GOBP_CELLULAR_RESPONSE_TO_INSULIN_STIMULUS',
'GOBP_LIPID_METABOLIC_PROCESS',
'GOBP_INOSITOL_LIPID_MEDIATED_SIGNALING')
indexes <- match(selected_BP , df[,colm])
Protein_negative_Output_C5_Selected <- df[indexes,]
ggplot(Protein_negative_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val) , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 7)) +
xlab('Reactome Pathways')
#miRNA Overrepresentation Grouping by MOFA factor
cluster_f1<- cluster_samples(object = MOFAobject, factors = 1, k = 2)
cluster_f1_a<- as.data.frame (cluster_f1[["cluster"]])
colnames(cluster_f1_a)<- "group"
Z_data<- get_expectations(MOFAobject, variable = "Z")
# rename groups
cluster_f1_a$groups <- ifelse(cluster_f1_a$group== 1, "Negative", "Positive")
Expression matrix
mi_indx<- match (rownames(cluster_f1_a), colnames(miRNA.data))
mi_mtrx<- miRNA.data[miRNA_Features, mi_indx]
cond1="Negative"
cond2="Positive"
# perform deseq analysis
dds = DESeqDataSetFromMatrix( countData = mi_mtrx, colData = cluster_f1_a , design = ~ groups )
dds.run = DESeq(dds)
res=results(dds.run, contrast = c("groups" ,cond2, cond1) )
# remove nulls
res=res[complete.cases(res), ]
summary(res)
mi_res.df=as.data.frame(res)
write.table(mi_res.df, file = "res_mi_RNA.txt")
## 3. Selection for DEMS
dems.res_up =mi_res.df[mi_res.df$padj< 0.05 & mi_res.df$log2FoldChange > 0.5,]
dim(dems.res_up)
dems.up=dems.res_up[order(dems.res_up$padj), ]
dems_up.=rownames(dems.up)
dems.res_dwn =mi_res.df[mi_res.df$padj< 0.05 & mi_res.df$log2FoldChange < -0.5,]
dim(dems.res_dwn)
dems.dwn=dems.res_dwn[order(dems.res_dwn$padj), ]
dems.d=rownames(dems.res_dwn)
# export them to a miRNA list to start the miRNA over representation analysis.
save( dems_up., dems.d , mi_res.df, file= "mi_res_dems.RDATA")
write.table(dems_up., file = "mi_dems_up.txt", quote = FALSE , row.names = FALSE, col.names = FALSE)
write.table(dems.d, file = "mi_dems_dwn.txt", quote = FALSE , row.names = FALSE, col.names = FALSE)
TAM2-UP - Visualization
TAM2_OUTPUT_UP <- read.delim("miRNA_UP_TAM2.txt")
table(TAM2_OUTPUT_UP$Category)
##
## Cell Specificity Cluster Disease
## 101 41 420
## Family Function Tissue Specificity
## 29 105 21
## Transcription Factor
## 49
TAM2_OUTPUT_UP_Function <- TAM2_OUTPUT_UP %>% filter(Category == 'Function') %>% dplyr::select(Term , FDR) %>% arrange(FDR)%>% filter(FDR < 0.05)
ggplot(TAM2_OUTPUT_UP_Function , aes(x = reorder(Term , -FDR) , y = -log(FDR) , fill = -log(FDR))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'TAM2_OUTPUT_UP_Function')
writeData(OUT, sheet = "TAM2_OUTPUT_UP_Function", x = TAM2_OUTPUT_UP_Function)
Select Functions
df <- TAM2_OUTPUT_UP_Function
colm <- "Term"
selected_Fn <- c("Immune Response",
"Inflammation",
"Immune System(Xiao's Cell2010)",
"Cell Proliferation",
"Cell Cycle",
"Regulation of Akt Pathway",
"Response to Hypoxia",
"T-Cell Activation",
"Neutrophil Differentiation")
indexes <- match(selected_Fn , df[,colm])
TAM2_OUTPUT_UP_Function_Selected <- df[indexes,]
ggplot(TAM2_OUTPUT_UP_Function_Selected , aes(x = reorder(Term , -FDR) , y = -log(FDR) , fill = -log(FDR))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
TAM2-DOWN - Visualization
TAM2_OUTPUT_DOWN <- read.delim("miRNA_DOWN_TAM2.txt")
table(TAM2_OUTPUT_DOWN$Category)
##
## Cell Specificity Cluster Disease
## 68 30 348
## Family Function Tissue Specificity
## 23 92 17
## Transcription Factor
## 37
TAM2_OUTPUT_DOWN_Function <- TAM2_OUTPUT_DOWN %>% filter(Category == 'Function') %>% dplyr::select(Term , FDR) %>% arrange(FDR)%>% filter(FDR < 0.05)
ggplot(TAM2_OUTPUT_DOWN_Function , aes(x = reorder(Term , -FDR) , y = -log(FDR) , fill = -log(FDR))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'TAM2_OUTPUT_DOWN_Function')
writeData(OUT, sheet = "TAM2_OUTPUT_DOWN_Function", x = TAM2_OUTPUT_DOWN_Function)
Select Functions
df <- TAM2_OUTPUT_DOWN_Function
colm <- "Term"
selected_Fn <- c("Cell Cycle",
"Cell Differentiation",
"Epithelial-to-Mesenchymal Transition",
"Adipocyte Differentiation",
"Lipid Metabolism",
"Cell Proliferation",
"T-Cell Differentiation",
"Extracellular Matrix Remodeling",
"Inflammation",
"Hormone-mediated Signaling Pathway ")
indexes <- match(selected_Fn , df[,colm])
TAM2_OUTPUT_DOWN_Function_Selected <- df[indexes,]
ggplot(TAM2_OUTPUT_DOWN_Function_Selected , aes(x = reorder(Term , -FDR) , y = -log(FDR) , fill = -log(FDR))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
#lncRNA Overrepresentation
#Prpare the gene set
lnc_input <- as.data.frame(MOFAobject@expectations$W$lncRNA[,'Factor1'])
colnames(lnc_input) <- 'Weights'
lnc_input <- lnc_input %>% arrange(desc(Weights))
lnc_input$probe_ID <- row.names(lnc_input)
lnc_input <- lnc_input[,c('probe_ID','Weights')]
write.table(lnc_input , file = "lnc_arranged.tsv" , quote = FALSE , row.names = FALSE , col.names = TRUE)
expression matrix
ln_indx<- match (rownames(cluster_f1_a), colnames(lncRNA.data))
lnc_mtrx<- lncRNA.data[lncRNA_Features, ln_indx]
lnc_mtrx <- lnc_mtrx[complete.cases(lnc_mtrx),]
cond1="Negative"
cond2="Positive"
# perform deseq analysis
dds = DESeqDataSetFromMatrix( countData = lnc_mtrx, colData = cluster_f1_a , design = ~ groups )
dds.run = DESeq(dds)
res=results(dds.run, contrast = c("groups",cond2 ,cond1) )
# remove nulls
res=res[complete.cases(res), ]
summary(res)
res.df=as.data.frame(res)
write.table(res.df, file = "res_lncRNA.txt")
## 3. Selection for DEMS
lnc.res_up =res.df[res.df$padj< 0.05 & res.df$log2FoldChange > 1,]
dim(lnc.res_up)
lnc.up=lnc.res_up[order(lnc.res_up$padj), ]
lnc_up.=rownames(lnc.up)
lnc.res_dwn =res.df[res.df$padj< 0.05 & res.df$log2FoldChange < -1,]
dim(lnc.res_dwn)
lnc.dwn=lnc.res_dwn[order(lnc.res_dwn$padj), ]
lnc.d=rownames(lnc.res_dwn)
# export them to a miRNA list to start the miRNA over representation analysis.
save( lnc_up., lnc.d,res.df, file= "res_lnc.RDATA")
write.table(lnc_up., file = "lnc_up.txt", quote = FALSE , row.names = FALSE, col.names = FALSE)
write.table(lnc.d, file = "lnc.d.txt", quote = FALSE , row.names = FALSE, col.names = FALSE)
lnc_Enrichment Visualization
Panther - Reactome - UP
Panther_Reactome_UP <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/panter_lnc_reacto_up.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE, skip = 11)
## Rows: 32 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Reactome pathways, Client Text Box Input (over/under)
## dbl (6): Homo sapiens - REFLIST (20589), Client Text Box Input (251), Client...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_Reactome_UP)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_Reactome_UP <- Panther_Reactome_UP %>% dplyr::select(`Reactome pathways` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.05)
ggplot(Panther_Reactome_UP , aes(x = reorder(`Reactome pathways` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'Panther_Reactome_UP')
writeData(OUT, sheet = "Panther_Reactome_UP", x = Panther_Reactome_UP)
Select Reactome Pathways
df <- Panther_Reactome_UP
selected_Fn <- c("Nucleosome assembly (R-HSA-774815)",
"Deposition of new CENPA-containing nucleosomes at the centromere (R-HSA-606279)",
"Packaging Of Telomere Ends (R-HSA-171306)",
"DNA methylation (R-HSA-5334118)",
"Depurination (R-HSA-73927)",
"Activated PKN1 stimulates transcription of AR (androgen receptor) regulated genes KLK2 and KLK3 (R-HSA-5625886)",
"Cleavage of the damaged purine (R-HSA-110331)",
"Recognition and association of DNA glycosylase with site containing an affected purine (R-HSA-110330)",
"SIRT1 negatively regulates rRNA expression (R-HSA-427359)",
"Assembly of the ORC complex at the origin of replication (R-HSA-68616)",
"Activation of HOX genes during differentiation (R-HSA-5619507)")
indexes <- match(selected_Fn , df$`Reactome pathways`)
Panther_Reactome_UP_Selected <- df[indexes,]
ggplot(Panther_Reactome_UP_Selected , aes(x = reorder(`Reactome pathways` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 5))
Panther - Reactome - DOWN
Panther_Reactome_DOWN <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/panther_lnc_dwn_reactome.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE, skip = 11)
## Rows: 2493 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Reactome pathways, Client Text Box Input (over/under), Client Text ...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (228), Client...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_Reactome_DOWN)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_Reactome_DOWN <- Panther_Reactome_DOWN %>% dplyr::select(`Reactome pathways` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.05)
ggplot(Panther_Reactome_DOWN , aes(x = reorder(`Reactome pathways` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'Panther_Reactome_DOWN')
writeData(OUT, sheet = "Panther_Reactome_DOWN", x = Panther_Reactome_DOWN)
Select Reactome Pathways
df <- Panther_Reactome_DOWN
selected_Fn <- c("Insulin processing (R-HSA-264876)",
"Deubiquitination (R-HSA-5688426)",
"RUNX1 regulates transcription of genes involved in WNT signaling (R-HSA-8939256)",
"Defective HK1 causes hexokinase deficiency (HK deficiency) (R-HSA-5619056)",
"Degradation of the extracellular matrix (R-HSA-1474228)",
"Transcriptional Regulation by MECP2 (R-HSA-8986944)",
"Insulin processing (R-HSA-264876)",
"Interferon alpha/beta signaling (R-HSA-909733)")
indexes <- match(selected_Fn , df$`Reactome pathways`)
Panther_Reactome_DOWN_Selected <- df[indexes,]
ggplot(Panther_Reactome_DOWN_Selected , aes(x = reorder(`Reactome pathways` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
mammary gland epithelial cell proliferation (GO:0033598) hormone metabolic process (GO:0042445) Panther - BP - UP
Panther_BP_UP <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/panther_lncRNA_bp_up.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE, skip = 11)
## Rows: 15681 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): GO biological process complete, Client Text Box Input (over/under),...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (251), Client...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_BP_UP)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_BP_UP <- Panther_BP_UP %>% dplyr::select(`GO biological process complete` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.01)
ggplot(Panther_BP_UP , aes(x = reorder(`GO biological process complete` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'Panther_BP_UP')
writeData(OUT, sheet = "Panther_BP_UP", x = Panther_BP_UP)
Panther - BP - DOWN
Panther_BP_DOWN <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/panther_lnc_dwn_GObp.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE, skip = 11)
## Rows: 15681 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): GO biological process complete, Client Text Box Input (over/under),...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (228), Client...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_BP_DOWN)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_BP_DOWN <- Panther_BP_DOWN %>% dplyr::select(`GO biological process complete` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.01)
ggplot(Panther_BP_DOWN , aes(x = reorder(`GO biological process complete` , -P.val) , y = -log(P.val) , fill = -log(P.val))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'Panther_BP_DOWN')
writeData(OUT, sheet = "Panther_BP_DOWN", x = Panther_BP_DOWN)
Topfun - Enrich - up
topfun_enrich_UP <- read.delim('Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/toppfun_lnc_up_enrich.txt')
table(topfun_enrich_UP$Category)
##
## Coexpression Atlas Computational Cytoband
## 117 53 79
## Disease Gene Family GO: Biological Process
## 1 4 10
## GO: Cellular Component GO: Molecular Function Interaction
## 8 1 62
## Mouse Phenotype Pathway Pubmed
## 42 12 2137
## ToppCell Atlas
## 262
topfun_enrich_UP_BP <- topfun_enrich_UP %>% filter(Category == "GO: Biological Process") %>% dplyr::select(Name , p.value)
ggplot(topfun_enrich_UP_BP , aes(x = reorder(`Name` , -p.value) , y = -log(p.value) , fill = -log(p.value))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'topfun_enrich_UP_BP')
writeData(OUT, sheet = "topfun_enrich_UP_BP", x = topfun_enrich_UP_BP)
Topfun - Enrich - DOWN
topfun_enrich_DOWN <- read.delim('Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/topfun_lnc_dwn.txt')
table(topfun_enrich_DOWN$Category)
##
## Coexpression Coexpression Atlas Computational
## 2 44 10
## Cytoband Gene Family GO: Biological Process
## 3 1 25
## Interaction Pathway Pubmed
## 19 1 326
## ToppCell Atlas
## 5
topfun_enrich_DOWN_BP <- topfun_enrich_DOWN %>% filter(Category == "GO: Biological Process") %>% dplyr::select(Name , p.value)
ggplot(topfun_enrich_DOWN_BP , aes(x = reorder(`Name` , -p.value) , y = -log(p.value) , fill = -log(p.value))) +
geom_bar(stat = 'identity') +
coord_flip() +
theme(axis.text = element_text(size = 8))
addWorksheet(OUT , 'topfun_enrich_DOWN_BP')
writeData(OUT, sheet = "topfun_enrich_DOWN_BP", x = topfun_enrich_DOWN_BP)
#Save Final Results
saveWorkbook(OUT, "Enrichment_Results/Final_Tables/Final_Enrichment.xlsx")
library(survival)
library(survminer)
## Loading required package: ggpubr
## Warning: package 'ggpubr' was built under R version 4.2.2
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
Total_pheno$OS.time <- as.numeric(Total_pheno$OS.time)
Total_pheno$deceased = Total_pheno$vital_status == "Dead"
Total_pheno$Factor1 <-ifelse(Zmatrix$group1.Factor1 > 0 , "Pos" , "Neg")
fit =survival::survfit(survival::Surv(OS.time,deceased ) ~ Factor1 , data = Total_pheno)
survminer::ggsurvplot(fit, data=Total_pheno,
pval=T,
risk.table=T,
risk.table.col="strata",
title = stringr::str_glue('Level'),
legend.title = "no. of Samples",
xlab = "Time by Days"
)
#Discussion The best MOFA factor was factor (1), which has the most
variance contribution from each omic, followed by factor (3). There is
no correlation detected between the generated factors. Factor (1)
clearly differentiated between breast cancer subtypes. The basal was
positive to factor one, while the luminal was negative.
#a. Features positive to factor one The resulted enriched pathways from the four omics were mainly related to four main terms:
(In this section we are representing examples of commonly enriched pathways, biological processes or Functions related to each module by exploring the highly enriched features)
#1. Immune modulation. #mRNA #First Pathway: REACTOME_INTERFERON_GAMMA_SIGNALING
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_INTERFERON_GAMMA_SIGNALING_241.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('GBP1', 'TRIM29', "GBP5" , 'VCAM1', 'HLA-E', 'HLA-A','SOCS3'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "GBP1",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('GBP1', 'TRIM29', "GBP5" , 'VCAM1', 'HLA-E', 'HLA-A','SOCS3') ,
scale = F # Scale weights from -1 to 1
)
#Second Pathway: REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM_273.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('FSCN1','SOD2_Genes', 'MSN', 'ANXA1_Genes','PSME4', 'TRIM29', 'FLNA', 'VIM'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "FSCN1",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('FSCN1','SOD2_Genes', 'MSN', 'ANXA1_Genes','PSME4', 'TRIM29', 'FLNA', 'VIM') ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#First Biological Process:
GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON_99.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('GBP1', 'ACTR3', "CX3CL1" , 'VIM', 'SYNCRIP', 'ASS1','GAPDH_Genes'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "ACTR3",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('GBP1', 'ACTR3', "CX3CL1" , 'VIM', 'SYNCRIP', 'ASS1','GAPDH_Genes') ,
scale = F # Scale weights from -1 to 1
)
#Second Biological Process: GOBP_HUMORAL_IMMUNE_RESPONSE
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_HUMORAL_IMMUNE_RESPONSE_119.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('PI3', 'S100A9', "KRT6A" , 'KLK7', 'SLPI', 'GAPDH_Genes','KLK5'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "S100A9",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('PI3', 'S100A9', "KRT6A" , 'KLK7', 'SLPI', 'GAPDH_Genes','KLK5') ,
scale = F # Scale weights from -1 to 1
)
#Protein #First Biological Process: GOBP_RESPONSE_TO_CYTOKINE
knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_RESPONSE_TO_CYTOKINE_37.png")
plot_data_scatter(MOFAobject,
view = "Protein",
factor = 1,
features = c('LYN', 'SIRPA', "TP53" , 'SYK', 'CD274', 'CD44_Protein','PAX6' , 'KIT_Protein' , "TFRC_Protein"),
sign = "positive",
color_by = "IMS"
) + labs(y="Protein expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "LYN",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 1,
manual = c('LYN', 'SIRPA', "TP53" , 'SYK', 'CD274', 'CD44_Protein','PAX6' , 'KIT_Protein' , "TFRC_Protein") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Second Pathway: REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM
knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_LYMPHOCYTE_APOPTOTIC_PROCESS_29.png")
plot_data_scatter(MOFAobject,
view = "Protein",
factor = 1,
features = c('IDO1_Protein','CASP7', 'AURKB', 'TP53','CHEK2'),
sign = "positive",
color_by = "IMS"
) + labs(y="Protein expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "AURKB",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 1,
manual = c('IDO1_Protein','CASP7', 'AURKB', 'TP53','CHEK2') ,
scale = F # Scale weights from -1 to 1
)
#miRNA #First Function: Immune Response
plot_data_scatter(MOFAobject,
view = "miRNA",
factor = 1,
features = c('hsa-mir-186','hsa-mir-203a','hsa-mir-362','hsa-mir-9-1',
'hsa-mir-92a-1','hsa-mir-203b','hsa-mir-146b','hsa-mir-196b',
'hsa-mir-9-2','hsa-mir-210'),
sign = "positive",
color_by = "IMS"
) + labs(y="miRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "hsa-mir-186",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 1,
manual = c('hsa-mir-186','hsa-mir-203a','hsa-mir-362','hsa-mir-9-1',
'hsa-mir-92a-1','hsa-mir-203b','hsa-mir-146b','hsa-mir-196b',
'hsa-mir-9-2','hsa-mir-210') ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Second Function: T-Cell Activation
plot_data_scatter(MOFAobject,
view = "miRNA",
factor = 1,
features = c("hsa-mir-223","hsa-mir-146a","hsa-mir-18a","hsa-mir-155","hsa-mir-31"),
sign = "positive",
color_by = "IMS"
) + labs(y="miRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "hsa-mir-223",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 1,
manual = c("hsa-mir-223","hsa-mir-146a","hsa-mir-18a","hsa-mir-155","hsa-mir-31") ,
scale = F # Scale weights from -1 to 1
)
#2. Cell cycle and Mitotic division. #mRNA #REACTOME_CELL_CYCLE_MITOTIC
#Protein #First Pathway: REACTOME_CELL_CYCLE_MITOTIC
knitr::include_graphics("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/enplot_REACTOME_CELL_CYCLE_MITOTIC_161.png")
plot_data_scatter(MOFAobject,
view = "Protein",
factor = 1,
features = c('CCNE1', 'LYN', "FOXM1" , 'CCNB1', 'CDK1', 'PLK1','AURKB' , 'TP53' , "AURKA"),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "CCNE1",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 1,
manual = c('CCNE1', 'LYN', "FOXM1" , 'CCNB1', 'CDK1', 'PLK1','AURKB' , 'TP53' , "AURKA") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Second Pathway: REACTOME_CELL_CYCLE_CHECKPOINTS
knitr::include_graphics("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/enplot_REACTOME_CELL_CYCLE_CHECKPOINTS_165.png")
#miRNA #First Function: Cell Cycle
plot_data_scatter(MOFAobject,
view = "miRNA",
factor = 1,
features = c("hsa-mir-24-1","hsa-mir-19b-1","hsa-mir-9-1","hsa-mir-221","hsa-mir-20a","hsa-mir-17","hsa-mir-92a-1","hsa-mir-19b-2","hsa-mir-31"),
sign = "positive",
color_by = "IMS"
) + labs(y="miRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = 'hsa-mir-24-1',
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 1,
manual = c("hsa-mir-24-1","hsa-mir-19b-1","hsa-mir-9-1","hsa-mir-221","hsa-mir-20a","hsa-mir-17","hsa-mir-92a-1","hsa-mir-19b-2","hsa-mir-31") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#3. Cell migration #mRNA #First Biological Process:
GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION_113.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('CXCL10', 'LGMN', "CALR" , 'CCL5', 'CD47', 'APP','S100A7'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "CXCL10",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('CXCL10', 'LGMN', "CALR" , 'CCL5', 'CD47', 'APP','S100A7') ,
scale = F # Scale weights from -1 to 1
)
#Protein #First Biological Process
#miRNA #First Function:
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_INFLUENZA_INFECTION_265.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('KPNA4', 'RPS7', "KPNA2" , 'CALR', 'RPS27A', 'IPO5','RPS19', "RAN" , "RPS12"),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "KPNA4",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('KPNA4', 'RPS7', "KPNA2" , 'CALR', 'RPS27A', 'IPO5','RPS19', "RAN" , "RPS12") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Second Pathway: GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE_83.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('PI3','S100A9', 'KRT6A', 'KLK7','SLPI', 'GAPDH_Genes', 'KLK5', 'CXCL10'),
sign = "positive",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('PI3','S100A9', 'KRT6A', 'KLK7','SLPI', 'GAPDH_Genes', 'KLK5', 'CXCL10') ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Protein #First Biological Process: GOBP_RESPONSE_TO_BACTERIUM
knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_RESPONSE_TO_BACTERIUM_3.png")
plot_data_scatter(MOFAobject,
view = "Protein",
factor = 1,
features = c('LYN', 'IDO1_Protein', "SIRPA" , 'CASP7', 'SOD2_Protein', 'SYK','CD274' , 'PRKCA' , "CD4"),
sign = "positive",
color_by = "IMS"
) + labs(y="Protein expression")
plot_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 1,
manual = c('LYN', 'IDO1_Protein', "SIRPA" , 'CASP7', 'SOD2_Protein', 'SYK','CD274' , 'PRKCA' , "CD4") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#miRNA #First Function: Response to Hypoxia
plot_data_scatter(MOFAobject,
view = "miRNA",
factor = 1,
features = c("hsa-mir-137","hsa-mir-210","hsa-mir-138-1","hsa-mir-155","hsa-mir-138-2","hsa-mir-18a","hsa-mir-196b"),
sign = "positive",
color_by = "IMS"
) + labs(y="miRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = 'hsa-mir-138-2',
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 1,
manual = c("hsa-mir-137","hsa-mir-210","hsa-mir-138-1","hsa-mir-155","hsa-mir-138-2","hsa-mir-18a","hsa-mir-196b") ,
scale = F # Scale weights from -1 to 1
)
#b. Features Negative to factor one #1. Lipid metabolism #mRNA #First Pathway: REACTOME_SPHINGOLIPID_METABOLISM
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_SPHINGOLIPID_METABOLISM_283.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('DEGS2', 'UGCG', "CERS4" , 'ASAH1', 'ARSD', 'CERS6','SPTLC2'),
sign = "negative",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "DEGS2",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('DEGS2', 'UGCG', "CERS4" , 'ASAH1', 'ARSD', 'CERS6','SPTLC2') ,
scale = F # Scale weights from -1 to 1
)
#Second Pathway: REACTOME_METABOLISM_OF_LIPIDS
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_METABOLISM_OF_LIPIDS_311.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('SLC44A4', 'DEGS2', "INPP4B_Genes" , 'UGCG', 'STARD10', 'CYP4B1','HSD17B4', "HACD3" , "ELOVL5"),
sign = "negative",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "SLC44A4",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('SLC44A4', 'DEGS2', "INPP4B_Genes" , 'UGCG', 'STARD10', 'CYP4B1','HSD17B4', "HACD3" , "ELOVL5") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#First Biologoical Process: GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS_121.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('DEGS2', 'UGCG', "HACD3" , 'ELOVL5', 'SCCPDH', 'CERS4','ASAH1'),
sign = "negative",
color_by = "IMS"
) + labs(y="mRNA expression")
#Protein #First Biological Process: GOBP_LIPID_METABOLIC_PROCESS
#miRNA #First Function: Lipid Metabolism
plot_data_scatter(MOFAobject,
view = "miRNA",
factor = 1,
features = c("hsa-mir-125a","hsa-mir-29c","hsa-mir-33b","hsa-mir-103a-2","hsa-mir-103a-1","hsa-mir-10b","hsa-mir-196a-2","hsa-mir-196a-1","hsa-mir-375"),
sign = "negative",
color_by = "IMS"
) + labs(y="miRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = 'hsa-mir-29c',
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "miRNA",
factor = c(1),
nfeatures = 1,
manual = c("hsa-mir-125a","hsa-mir-29c","hsa-mir-33b","hsa-mir-103a-2","hsa-mir-103a-1","hsa-mir-10b","hsa-mir-196a-2","hsa-mir-196a-1","hsa-mir-375") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
2. Hormone related pathways #mRNA #First Biological Process:
GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT
knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT_123.png")
plot_data_scatter(MOFAobject,
view = "Genes",
factor = 1,
features = c('ESR1_Genes', 'ERBB4', "GATA3_Genes" , 'AR_Genes', 'PGR_Genes', 'ZNF703','PRLR' , 'CCND1_Genes' , "TBX3"),
sign = "negative",
color_by = "IMS"
) + labs(y="mRNA expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "ESR1_Genes",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Genes",
factor = c(1),
nfeatures = 1,
manual = c('ESR1_Genes', 'ERBB4', "GATA3_Genes" , 'AR_Genes', 'PGR_Genes', 'ZNF703','PRLR' , 'CCND1_Genes' , "TBX3") ,
scale = F # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors
#Protein #First Biological Process:
GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT
knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT_43.png")
plot_data_scatter(MOFAobject,
view = "Protein",
factor = 1,
features = c('ESR1_Protein', 'GATA3_Protein', "AR_Protein" , 'PGR_Protein', 'MAPK1_Protein'),
sign = "negative",
color_by = "IMS"
) + labs(y="Protein expression")
plot_factor(MOFAobject,
factors = c(1),
color_by = "ESR1_Protein",
add_violin = T,
group_by = "IMS",
show_missing = T,
add_boxplot = T,
scale = T,
dot_size = 3,
dodge = T,
stroke = 0.1
)
plot_weights(MOFAobject,
view = "Protein",
factor = c(1),
nfeatures = 1,
manual = c('ESR1_Protein', 'GATA3_Protein', "AR_Protein" , 'PGR_Protein', 'MAPK1_Protein') ,
scale = F # Scale weights from -1 to 1
)
sessionInfo()
## R version 4.2.0 (2022-04-22 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] survminer_0.4.9 ggpubr_0.5.0
## [3] survival_3.3-1 biomaRt_2.52.0
## [5] rstatix_0.7.2 data.table_1.14.6
## [7] MOFAdata_1.14.0 MOFA2_1.6.0
## [9] openxlsx_4.2.5.2 ggupset_0.3.0
## [11] ComplexUpset_1.3.3 TCGAbiolinks_2.24.3
## [13] ggrepel_0.9.2 umap_0.2.9.0
## [15] ggfortify_0.4.15 ComplexHeatmap_2.12.1
## [17] forcats_1.0.0 stringr_1.5.0
## [19] dplyr_1.0.10 purrr_0.3.5
## [21] tidyr_1.2.1 tibble_3.1.8
## [23] ggplot2_3.4.0 tidyverse_1.3.2
## [25] DESeq2_1.36.0 SummarizedExperiment_1.26.1
## [27] MatrixGenerics_1.8.1 matrixStats_0.63.0
## [29] GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
## [31] IRanges_2.30.1 S4Vectors_0.34.0
## [33] vsn_3.64.0 Biobase_2.56.0
## [35] BiocGenerics_0.42.0 readr_2.1.3
## [37] readxl_1.4.1
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.26
## [3] tidyselect_1.2.0 RSQLite_2.2.20
## [5] AnnotationDbi_1.58.0 BiocParallel_1.30.4
## [7] Rtsne_0.16 munsell_0.5.0
## [9] codetools_0.2-18 preprocessCore_1.58.0
## [11] withr_2.5.0 colorspace_2.0-3
## [13] filelock_1.0.2 highr_0.10
## [15] knitr_1.42 rstudioapi_0.14
## [17] ggsignif_0.6.4 labeling_0.4.2
## [19] GenomeInfoDbData_1.2.8 KMsurv_0.1-5
## [21] mnormt_2.1.1 bit64_4.0.5
## [23] farver_2.1.1 pheatmap_1.0.12
## [25] rhdf5_2.40.0 downloader_0.4
## [27] basilisk_1.8.1 vctrs_0.5.1
## [29] generics_0.1.3 xfun_0.37
## [31] timechange_0.2.0 BiocFileCache_2.4.0
## [33] markdown_1.5 R6_2.5.1
## [35] doParallel_1.0.17 clue_0.3-63
## [37] locfit_1.5-9.7 bitops_1.0-7
## [39] rhdf5filters_1.8.0 cachem_1.0.6
## [41] reshape_0.8.9 DelayedArray_0.22.0
## [43] assertthat_0.2.1 vroom_1.6.1
## [45] scales_1.2.1 googlesheets4_1.0.1
## [47] gtable_0.3.1 affy_1.74.0
## [49] rlang_1.0.6 genefilter_1.78.0
## [51] GlobalOptions_0.1.2 splines_4.2.0
## [53] gargle_1.3.0 broom_1.0.3
## [55] abind_1.4-5 BiocManager_1.30.19
## [57] yaml_2.3.6 reshape2_1.4.4
## [59] modelr_0.1.10 backports_1.4.1
## [61] gridtext_0.1.5 tools_4.2.0
## [63] psych_2.2.9 affyio_1.66.0
## [65] ellipsis_0.3.2 jquerylib_0.1.4
## [67] RColorBrewer_1.1-3 Rcpp_1.0.9
## [69] plyr_1.8.8 progress_1.2.2
## [71] zlibbioc_1.42.0 RCurl_1.98-1.9
## [73] basilisk.utils_1.8.0 prettyunits_1.1.1
## [75] openssl_2.0.5 GetoptLong_1.0.5
## [77] cowplot_1.1.1 zoo_1.8-11
## [79] haven_2.5.1 cluster_2.1.3
## [81] fs_1.6.0 magrittr_2.0.3
## [83] RSpectra_0.16-1 circlize_0.4.15
## [85] reprex_2.0.2 googledrive_2.0.0
## [87] hms_1.1.2 patchwork_1.1.2
## [89] TCGAbiolinksGUI.data_1.16.0 evaluate_0.20
## [91] xtable_1.8-4 XML_3.99-0.13
## [93] gridExtra_2.3 shape_1.4.6
## [95] compiler_4.2.0 crayon_1.5.2
## [97] htmltools_0.5.4 mgcv_1.8-40
## [99] tzdb_0.3.0 ggtext_0.1.2
## [101] geneplotter_1.74.0 lubridate_1.9.1
## [103] DBI_1.1.3 corrplot_0.92
## [105] dbplyr_2.3.0 rappdirs_0.3.3
## [107] Matrix_1.5-3 car_3.1-1
## [109] cli_3.5.0 parallel_4.2.0
## [111] km.ci_0.5-6 pkgconfig_2.0.3
## [113] dir.expiry_1.4.0 xml2_1.3.3
## [115] foreach_1.5.2 annotate_1.74.0
## [117] bslib_0.4.2 XVector_0.36.0
## [119] rvest_1.0.3 digest_0.6.30
## [121] Biostrings_2.64.1 rmarkdown_2.20
## [123] cellranger_1.1.0 survMisc_0.5.6
## [125] uwot_0.1.14 curl_5.0.0
## [127] commonmark_1.8.1 rjson_0.2.21
## [129] nlme_3.1-157 lifecycle_1.0.3
## [131] jsonlite_1.8.3 Rhdf5lib_1.18.2
## [133] carData_3.0-5 askpass_1.1
## [135] limma_3.52.4 fansi_1.0.3
## [137] pillar_1.8.1 lattice_0.20-45
## [139] GGally_2.1.2 KEGGREST_1.36.3
## [141] fastmap_1.1.0 httr_1.4.4
## [143] glue_1.6.2 zip_2.2.2
## [145] png_0.1-8 iterators_1.0.14
## [147] bit_4.0.5 stringi_1.7.8
## [149] sass_0.4.5 HDF5Array_1.24.2
## [151] blob_1.2.3 memoise_2.0.1